Abstract: A sample of 230 undergraduate psychology students rated their expectations of a
bogus professor (who was randomly designated a man or woman and “hot” versus “not hot”) 
based on ratings and comments found on RateMyProfessors.com. Five professor qualities were 
derived using principal components analysis: dedication, attractiveness, enhancement, fairness, 
and clarity. Participants rated current actual psychology professors on the same qualities. Current 
professors were divided based on gender (man or woman), age (under 35 or 35 and older), and 
attractiveness (at or below the median or above the median). Using a multivariate analysis of 
covariance (MANCOVA), students expected hot professors to be more attractive but lower in 
clarity. They rated current professors who were male and 35 or older as lowest in clarity. Current 
professors scored significantly lower in dedication, enhancement, fairness, and clarity when rated 
at or below the median on attractiveness. Results, along with previous research, suggest 
numerous factors (largely out of professors’ control) influencing how students interpret and create 
professor ratings. Caution is therefore warranted in using online ratings to inform a variety of 
decisions, including students’ course selection or even administrators’ hiring and promotion 
decision making. 

Introduction

University instructors have long been regularly evaluated by their own students. In recent 
decades, opportunities for college students to express their opinions of their professors have 
expanded beyond formal university-administered group evaluations to posts on informal websites 
like RateMyProfessors.com (RMP; Johnson & Crews, 2013). RMP is a website on which 
university students may post anonymous evaluations of their professors. Since its inception in 
1999, the service has become immensely popular throughout the world; users have created over 
17 million ratings for 1.6 million professors in Canada, the United Kingdom, and the United States 
(RMP, 2016a). Sites similar to RMP operate in other countries; for example, Rate My Teachers
(ratemyteachers.com) serves the Republic of Ireland, Australia, and New Zealand. In addition to
writing narrative comments about their professors, students use RMP to rate their professors on
helpfulness, instructional clarity, and course easiness using a rating scale of 1 (low) to 5 (high).
Scores ranging from 3.5 to 5 are considered “good,” scores ranging from 2.5 to 3.4 are considered 
“average,” and scores ranging from 1 to 2.4 are considered “poor” (RMP, 2016b). Helpfulness
and instructional clarity ratings are averaged to produce an overall quality score (RMP, 2016b). 
Moreover, students also rate professors as “hot” or “not hot”; a chili pepper icon appears on the 
RMP profile of professors whose aggregate hot ratings exceed their not-hot ratings (RMP, 2016c). 
While RMP does not explicitly state that the chili pepper icon represents a professor’s physical 
attractiveness, many assume this to be true (Landry, Kurkul, & Poirier, 2010). 
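
The scoring bands and the overall quality score described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical rather than anything RMP publishes, and the sketch assumes scores are reported to one decimal place, which is why the bands leave no gap between 2.4 and 2.5 or between 3.4 and 3.5.

```python
def rmp_band(score):
    """Label a 1-5 score using the bands described in the text (assumes one decimal place)."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("score must be between 1 and 5")
    rounded = round(score, 1)
    if rounded >= 3.5:
        return "good"
    if rounded >= 2.5:
        return "average"
    return "poor"


def overall_quality(helpfulness, clarity):
    """Overall quality as the average of helpfulness and instructional clarity."""
    return round((helpfulness + clarity) / 2, 1)


# Hypothetical ratings, for illustration only.
example_overall = overall_quality(4.0, 3.0)
example_band = rmp_band(example_overall)
```

Easiness is deliberately left out of `overall_quality`, because the text states that only helpfulness and clarity are averaged.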

The impetus for RMP and similar websites is to provide students with a forum to exchange 
information about their professors and courses (RMP, 2016a). Given that the results of university- 
administered teaching evaluations are typically inaccessible to students (Kindred & Mohammed, 
2005), RMP offers students a publicly available platform for sharing course and professor
data. Because it gives potential students otherwise inaccessible ratings from former students
(Kindred & Mohammed, 2005), RMP allows those considering a professor to inform their enrollment
choices in hope of receiving a higher quality college education (Davison & Price, 2009; Johnson
& Crews, 2013). Not surprisingly, concerns have been voiced about potential bias in online 
ratings, and many professors doubt the utility of sites like RMP for students truly seeking a higher 
quality education (Boswell, 2016; Davison & Price, 2009). 

Professors cite several sources of concern regarding the validity of RMP ratings (e.g., 
Davison & Price, 2009; Hartman & Hunt, 2013; Sonntag, Bassett, & Snyder, 2009). First, there is 
no guarantee that ratings have actually been posted by former students of the professor (Johnson 
& Crews, 2013; Montell, 2006; Otto, Sanford, & Ross, 2008; Timmerman, 2008). For example, 
the first and second authors both have at least once been rated on RMP for classes they have 
never taught, possibly a result of students not correctly remembering the names of the actual 
instructors. While instances such as this may produce laughter and seem fairly harmless, more 
alarming cases have been recorded involving negative postings made by rivals or disgruntled 
colleagues instead of students (see Carnevale, 2006). Second, even when postings are crafted 
by actual students, concerns remain about the validity of such postings as reflections of teaching 
quality or as windows into what potential students may expect from taking a class with a particular 
professor (Legg & Wilson, 2012). For example, students self-selecting to participate in RMP 
posting may harbor deeply felt or extreme views and may not represent a professor’s general 
student body (Boswell, 2015; Legg & Wilson, 2012). Further concerns exist about possible biasing 
factors shaping how online professor ratings are both interpreted and created. 

The purpose of the current investigation was to examine potential sources of bias in both 
interpreting and posting online professor ratings. It should be noted that there is evidence for a 
variety of sources of bias in university-administered teaching evaluations (which would 
presumably involve fewer concerns about the identity of the evaluator than RMP-type postings). 
For example, university-administered teaching evaluations may be subject to bias from whether 
the evaluated course is required versus elective (Divoky & Rothermel, 1988; Feldman, 1978; 
Patrick, 2011; Petchers & Chow, 1988; Scherr & Scherr, 1990), whether the course is higher level 
versus lower level (Goldberg & Callahan, 1991; Moritsch & Suter, 1988; Patrick, 2011), and 
whether it is a humanities or social sciences course versus a math or science course (Cashin, 
1992; Patrick, 2011). Students’ perceptions of the instructor’s personality characteristics or 
interaction style also impact evaluations, which could reflect influence on actual quality of teaching 
or on more peripheral features like likability (Ahmadi, Helms, & Raiszadeh, 2001; Clayson & 
Sheffet, 2006; Feldman, 1986; Hart & Driver, 1978; Jenkins & Downs, 2001; Patrick, 2011; 
Widmeyer & Loy, 1988; R. Wilson, 1998). Moreover, other teaching-irrelevant qualities such as 
style of dress (Eadie, 1996; Sebastian & Bristow, 2008), formality of name (e.g., title and last 
name versus first name; Sebastian & Bristow, 2008), and ability to be entertaining (Gotlieb, 2011) 
affect students’ evaluations of professors. These teaching-irrelevant qualities exert such an 
influence upon professors’ evaluations that it is possible for students to predict how a professor
will be evaluated simply by watching a muted video clip of the professor; entertaining individuals, 
even with no demonstrated knowledge of the topic, benefit from bias and receive higher teaching 
evaluations (Ambady & Rosenthal, 1993). 

Given that RMP provides students with a similar opportunity to offer opinions of professors, 
it is likely that RMP ratings will be affected by the same sources of bias that affect their university- 
administered counterparts. Moreover, the lack of quality control (Johnson & Crews, 2013) inherent 
in ratings on RMP and similar sites likely allows for additional sources of bias to shape the content 
of ratings, for example, course easiness (Felton, Koper, Mitchell, & Stinson, 2008; Felton, Mitchell, 
& Stinson, 2004) and professor race (Reid, 2010). 

Investigating potential sources of bias or distortion of online professor ratings is particularly 
important because, while RMP presents itself as an entertainment site, its data are commonly used
for numerous purposes well beyond pure entertainment (Landry et al., 2010). Research has 
demonstrated that many students use such sites before signing up for classes with specific 
professors (Hossain, 2009), and exposure to online professor ratings may actually influence
students’ expectations and motivations for their own performance in a course (Edwards, Edwards, 
Qing, & Wahl, 2007; Edwards, Edwards, Shaver, & Oaks, 2009). Exposure to positive or negative 
online ratings has even been shown to influence students’ teaching evaluations of an actual 
classroom lecture (Lewandowski, Higgins, & Nardone, 2012). Moreover, RMP evaluations may 
influence students’ in-class behaviors such as notetaking and participation in class discussions 
and activities (Kowai-Bell, Guadagno, Little, Preiss, & Hensley, 2011). Taken together, these
findings suggest that RMP content may exert a significant influence upon students’ learning and 
academic achievement. 

In addition to its effects on students, RMP’s consequences also extend to institutions and 
professors. For example, RMP scores and narrative comments at least partially contribute to 
some college rankings (Howard, 2014; Johnson & Crews, 2013) as well as promotion and hiring
decisions (Johnson & Crews, 2013; Montell, 2006; Pannapacker, 2007). Professors, too, appear 
affected by online rating content in terms of their affect and self-efficacy, and the effects do not 
differ from those of reading more respected university-administered student evaluations of 
teaching (Boswell, 2016). Past research regarding targeted possible influences on interpretation 
and generation of online professor ratings will be described in detail below. 

Gender of Professor 

Gender expectations may lead students to interpret professor ratings or rate professors 
differently based on whether they are men or women. Prior research has indicated that college 
students tend to rate men professors more favorably than women professors (Abel & Meltzer, 
2007; Basow & Silberg, 1987; Joye & Wilson, 2015). This bias extends even to the online 
classroom, where students and professors do not interact in person. For example, assistant 
instructors, working under both a stereotypically male and stereotypically female pseudonym, 
were evaluated more favorably when using the male identity (MacNell, Driscoll, & Hunt, 2015). 
Other research has revealed, however, that a main effect of professor gender is likely best 
interpreted within the context of additional moderating factors. For example, college students 
evaluate women professors more harshly when they do not conform to gender-based 
expectations of helpfulness and flexibility (Bennett, 1982), and women are evaluated more 
severely than men when they have high grading standards and teach academically rigorous 
courses (Sinclair & Kunda, 2000). Other research has suggested that student raters interpret 
professor qualities differently when rating men versus women, attributing lack of clarity to low 
effort from men professors but low ability in women professors (Stuber, Watson, Carle, & Staggs,
2009). Furthermore, students rating a supposed applicant for a university teaching position 
appear to expect different qualities from men versus women applicants (Burns-Glover & Veith, 
1995). Based on this body of findings, the following hypotheses were included: 

Hypothesis 1: College students’ interpretations of online professor ratings will differ by 
professor gender (whether the professor is described as a man versus a woman). 

Hypothesis 2: College students will rate their own professors differently depending on 
whether they are a man versus a woman. 

Age of Professor 

In addition to evidence for gender bias, studies indicate a professor’s age also affects 
students’ evaluations of teaching (Bianchini, Lissoni, & Pezzoni, 2013; Feldman, 1983; Kinney & 
Smith, 1992). For example, Arbuckle and Williams (2003) demonstrated that students would 
evaluate a lecture more positively if they were led to believe that it was delivered by a young (i.e., 
under 35) man, supporting the notion that students expect college professors to be or at least 
resemble younger men (Messner, 2000). Moreover, in an analysis of RMP ratings, Stonebraker 
and Stone (2015) found that elevated age negatively affects students’ ratings of professors’ 
teaching; these effects begin as early as professors’ mid-forties. Other recent findings also 
indicate that students rate older professors most harshly (Joye & Wilson, 2015; J. Wilson, Beyer, 
& Monteiro, 2014; Zabaleta, 2007), possibly because of the professors’ dissimilarity to students 
who are typically in their late teens or early twenties (Gehrt, Louie, & Osland, 2015). The following 
hypotheses were generated from this evidence supporting a potential main effect of professor 
age and, based on the findings of Arbuckle and Williams (2003), a moderating effect of professor 
gender on this age effect: 

Hypothesis 3: College students will rate their own professors differently depending on 
whether they are under 35 versus 35 or older. 

Hypothesis 4: Professor age and gender will interact in influencing students’ evaluations 
of their own professors. 

While college students may interpret online professor ratings differently depending on 
whether the professor is described as younger versus older, this hypothesis was not tested in the 
current study. This was largely because of concerns that online comments on professor rating 
sites do not often include information regarding professor age. 

The Chili Pepper (or “Hotness”) 

Past studies also have indicated that professors higher in hotness (defined in various 
ways, but generally encompassing students’ subjective appraisal of professors’ physical features) 
are perceived more positively than those lacking hotness (Bonds-Raacke & Raacke, 2007; Buck 
& Tiene, 1989; Felton et al., 2004; Felton et al., 2008; Freng & Webber, 2009; Hamermesh &
Parker, 2005; Liu, Hu, & Furutan, 2013; Riniolo, Johnson, Sherman, & Misso, 2006; Romano & 
Bordieri, 1989). Teaching at the college level may represent one of the countless situations in 
which people attribute higher degrees of socially desirable traits to attractive people but not less 
attractive individuals (Dion, Berscheid, & Walster, 1972), often summarized with the phrase “what 
is beautiful is good” (Eagly, Ashmore, Makhijani, & Longo, 1991, p. •••). 

RMP provides information on hotness with the chili pepper feature. Professors with a chili
pepper are hot; professors without a chili pepper lack hotness (RMP, 2016c). Research on the 
effects of the dichotomous chili pepper feature has yielded mixed results; some studies indicate 
little influence on perceptions of teaching quality (Coladarci & Kornfield, 2007) and even lack of 
user respect for its meaningfulness (Kindred & Mohammed, 2005), but others demonstrate that 
professors with a chili pepper receive more favorable ratings (Lawson & Stephenson, 2005). For 
example, hotness may impact ratings of professors’ clarity and helpfulness (Bonds-Raacke & 
Raacke, 2007). The pool of evidence, while mixed, led the authors to include the following 
hypotheses: 

Hypothesis 5: College students will interpret online professor ratings differently depending 
on whether the professor is noted to be hot or not. 

Hypothesis 6: College students will rate current professors differently depending on how 
physically attractive they find them. 

Given the evidence that self-reported females rate their instructors differently than
self-reported males (Burns-Glover & Veith, 1995; Kohn & Hatfield, 2006), exhibit a preference for
professors of the same gender (Gehrt et al., 2015), and are more influenced by hotness in
evaluating professors (Liu et al., 2013), all hypotheses were tested while controlling for
participants’ reported sex. Because some have proposed perceived similarity as the driving 
feature in age effects on professor ratings (Gehrt et al., 2015), participant age was also included 
as a control variable for both hypotheses addressing professor age. 

Method 


Participants 

The convenience sample consisted of 230 college undergraduate students enrolled in 
introductory psychology and lifespan developmental psychology classes at a public university in 
the southern United States. Participants were recruited from 18 class sections ranging in size 
from 40 to 100 students. Professors teaching these sections during the year of study recruitment 
included six men and six women. Students received course credit for participation; the option to 
participate in other studies and an alternate assignment were available to students who chose 
not to participate or did not meet inclusion requirements for this investigation. This study specified 
that participants must have been 18 years of age or older at the time of participation. 

Participants providing their sex (n = 228) included 43 males (18.70%) and 185 females 
(80.43%). The average age was 19.54 years (SD = 1.62). The sample was predominantly White 
(70.00%), with 20.43% African American, 3.91% Hispanic or Latino, 2.61% Asian, and 0.87% 
Native American, Aleut, or Aboriginal peoples. Participants’ self-reported grade point averages 
were somewhat high (M = 3.18; SD = .60), averaging in the range of a letter grade of B. The vast
majority of participants (91.74%) had previously visited RMP, and 76.50% reported using the site 
to make decisions about enrolling in classes at least some of the time. Out of the full sample, 218 
participants’ (41 males, 177 females) data were complete on all proposed covariates and 
independent and dependent variables and were included in final statistical analyses. The 
Institutional Review Board reviewed and approved this study prior to recruitment. 
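
As a small worked check on the sample description, the reported sex percentages can be recomputed from the counts. The sketch below assumes the denominator is the full sample of 230 rather than the 228 participants who reported their sex; that assumption reproduces the reported 18.70% and 80.43%. The variable and function names are illustrative.

```python
FULL_SAMPLE = 230  # total participants reported in the text
sex_counts = {"male": 43, "female": 185}


def pct_of_sample(count, total=FULL_SAMPLE):
    """Percentage of the full sample, rounded to two decimals as in the text."""
    return round(100 * count / total, 2)


sex_pct = {label: pct_of_sample(n) for label, n in sex_counts.items()}

# Participants who did not report their sex (230 - 228).
not_reported = FULL_SAMPLE - sum(sex_counts.values())
```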


Procedure 




www.hlrcjournal.com
Open Access

Data collection occurred online. Participants accessed the questionnaire using a web link 
posted by the primary investigator on the psychology department’s participant recruitment site. 
Upon opening the survey, participants were randomly assigned to view one of four
versions of the bogus professor’s online rating summary. Before viewing this material and
follow-up questions regarding their expectations of the bogus professor, their ratings of a current actual
professor, and demographics, participants initialed an informed consent form. They were required 
to complete the questionnaires in one session. Instructions stated that participants were allowed 
to skip any items with which they felt uncomfortable. Participants were assured that all information 
would be kept confidential and that no responses would be shared with their psychology 
professors. Data were managed by a graduate research assistant not teaching any psychology 
classes so as to maintain strict confidentiality. 

Measures 

Demographics. Participants completed closed-ended items addressing their own sex and 
race. In addition, they completed open-ended items regarding their own current age and grade 
point average on a 5-point scale (0 = F, 4 = A). 

Previous exposure to RMP. Participants reported whether they had ever visited RMP (a 
yes/no item) and how often they made decisions about enrolling in classes based on RMP ratings 
using a rating scale from 1 (never) to 5 (always or almost always).

Perceived demographics of current professor. Participants were asked how old they 
believed their current psychology professor to be (under 35 years old versus 35 years old or older) 
using a closed-ended item. They additionally responded as to whether their current psychology 
professor was a man or a woman. Out of the full sample of responses, 106 participants reported 
that their professor was a man, while 115 identified their professor as a woman. In addition, 95 
participants perceived their professor to be under 35, and 126 perceived their professor to be 35 
or older. 

Interpretation of online professor ratings. Because the investigators desired to learn 
more about students’ expectations from and creation of professor ratings, professor ratings were 
assessed in two distinct ways. First, to measure students’ expectations of a professor based on 
reading professor ratings, participants read an online rating of a bogus professor and then rated 
the professor on a series of teaching and personal qualities (see Table 1). The stimulus 
professor’s rating was presented in the format used on RMP; ratings were in the average range 
for helpfulness (3.2 out of 5) and in the good range for clarity (4.2 out of 5), easiness (4.0 out of 
5), and overall quality (3.8 out of 5). The researchers intended the scores to indicate neutral to 
good quality. Extreme scores were avoided so that the scores themselves would not 
overwhelmingly command attention. The description also listed whether the professor had a chili 
pepper to indicate being hot. There were four different versions of the described professor: a 
woman with no chili pepper, a woman with a chili pepper, a man with no chili pepper, and a man 
with a chili pepper. The professor was always listed with the intentionally gender-neutral name 
“Alex Johnson,” but pronouns differed between female and male versions. 
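
The four stimulus versions form a 2 × 2 design (professor gender by chili pepper). A minimal sketch of enumerating the conditions and assigning one at random follows. The source does not say how randomization was implemented, so the simple (unblocked) random choice here is an assumption, and all names are illustrative.

```python
import itertools
import random

GENDERS = ("woman", "man")
HOTNESS = ("no chili pepper", "chili pepper")

# Every combination of the two manipulated factors: four versions in total.
CONDITIONS = list(itertools.product(GENDERS, HOTNESS))


def assign_condition(rng):
    """Pick one of the four versions with equal probability (assumed simple randomization)."""
    return rng.choice(CONDITIONS)


# A seeded generator makes the illustration reproducible.
example_assignment = assign_condition(random.Random(0))
```

Simple randomization of this kind does not guarantee equal cell sizes; a blocked scheme would, but nothing in the source indicates which was used.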

After reading the bogus RMP listing, participants were asked how often they would predict 
the professor would display a series of qualities. Items were scaled from 1 (never or almost never) 
to 4 (always or nearly always). The items presented (see Table 1) were crafted for the present 
investigation. The content of the items was generated by the investigators based on frequent 
content of their own teaching evaluations and online professor ratings. Additionally, students 
enrolled in independent research were asked to contribute any additional items they deemed
appropriate. 

To examine potential subcategories of teaching qualities, a principal components analysis 
(PCA) was conducted using promax rotation because items were anticipated to be correlated. 
The Kaiser-Meyer-Olkin measure of sampling adequacy was .92, and Bartlett’s test of sphericity 
was statistically significant (p = .00), indicating the data were suitable for PCA (Field, 2013). Factor
loadings obtained from the pattern matrix are summarized in Table 1. The authors employed a 
criterion of a factor loading of .40 or higher for inclusion of an item in a particular subcategory. 
The total number of factors or subcategories was determined using the Kaiser criterion of an 
eigenvalue of at least 1.00 (Field, 2013). 

As seen in Table 1, the first factor, Dedication, accounting for 36.77% of variance, included 
items centered on the theme of professors behaving in a professional and respectful manner and 
conveying general enjoyment of teaching. The second factor, Attractiveness, explaining 12.95% 
of variance, included items about the professor’s physical appearance. The third factor, 
Enhancement, accounting for 7.69% of variance, included items referring to enrichment of 
teaching, what some might label “the little extras” in obtaining and maintaining student interest. 
Next, the fourth factor, Fairness, representing 4.02% of variance, reflected a theme of 
evenhandedness and consistency in teaching and grading. Finally, the fifth factor, Clarity, 
explaining 3.42% of variance, centered on making oneself understood by students. 
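
The variance percentages above can be related back to the Kaiser criterion with a short calculation. In a PCA of a correlation matrix, total variance equals the number of items, so each component's eigenvalue is its proportion of variance multiplied by the item count. The sketch assumes the percentages refer to the extracted (unrotated) components and that 31 items were analyzed, the number of rows in Table 1; neither point is stated explicitly in the source.

```python
N_ITEMS = 31  # assumption: number of item rows in Table 1

PCT_VARIANCE = {
    "Dedication": 36.77,
    "Attractiveness": 12.95,
    "Enhancement": 7.69,
    "Fairness": 4.02,
    "Clarity": 3.42,
}


def implied_eigenvalue(pct, n_items=N_ITEMS):
    """Eigenvalue implied by a percentage of variance in a correlation-matrix PCA."""
    return pct / 100 * n_items


eigenvalues = {name: implied_eigenvalue(pct) for name, pct in PCT_VARIANCE.items()}

# Share of total variance captured by the five retained components.
cumulative_pct = sum(PCT_VARIANCE.values())

# Kaiser criterion: retain components whose eigenvalue is at least 1.00.
all_retained = all(value >= 1.0 for value in eigenvalues.values())
```

Under these assumptions the five components together account for roughly 65% of the variance, and even the smallest (Clarity) has an implied eigenvalue just above 1, which is consistent with a five-factor solution under the Kaiser criterion.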

Table 1. Summary of Factor Loadings From Principal Components Analysis of Professor Rating Items
(Promax Rotation)

                                                        Factor
Item                                             1     2     3     4     5
Arrives on time                                .45   .16  -.29   .27   .28
Is helpful                                     .63  -.15  -.02   .23  -.09
Is polite to students                          .67   .08  -.10   .26  -.22
Smiles                                         .48   .23   .03   .30  -.32
Knows students by name or face                 .40   .07   .18   .31  -.32
Exhibits respect toward students               .57   .03  -.08   .29   .14
Appears to be in a good mood                   .70   .15   .15  -.03  -.14
Displays confidence                            .66   .30   .08  -.22   .06
Seems to enjoy teaching                        .90  -.01   .03  -.26   .03
Hands back tests and assignments quickly       .73  -.16   .07  -.08   .17
Is available outside of class                  .89  -.19   .04  -.16   .10
Uses class time discussing relevant topics     .70  -.15   .12  -.06   .19
Dresses professionally                         .12   .77  -.25   .11   .16
Is physically attractive                      -.17   .96   .02   .01   .04
Is physically fit                             -.13   .96   .03  -.03   .04
Has attractive or appropriate hairstyle       -.08   .93   .11  -.04   .03
Wears flattering clothing                      .00   .91   .09  -.04   .00
Uses examples from popular culture             .11   .07   .71  -.02   .09
Shows videos                                  -.19   .02   .94   .13   .02
Performs in-class demonstrations               .02  -.01   .84   .03   .08
Gives extra credit                             .33  -.06   .46   .10  -.16
Tells funny jokes                              .22   .21   .55  -.23   .05
Shares relevant examples in class              .25   .02   .46  -.03   .25
Gives handouts                                 .02  -.13   .65   .25   .06
Has clear grading policies                     .21  -.06  -.10   .57   .28
Tests material covered in lectures             .06   .03  -.01   .75   .03
Tests material covered in readings            -.37  -.07   .21   .89   .05
Is consistent                                  .13   .06   .12   .56   .28
Explains topics clearly                        .08   .10  -.04   .18   .70
Speaks loudly enough to be heard              -.11   .13   .28   .00   .58
Breaks down complicated topics                 .40  -.11   .14   .10   .44

Note. Bolded values indicate inclusion in the composite variable. Factor 1 = Dedication; Factor 2 =
Attractiveness; Factor 3 = Enhancement; Factor 4 = Fairness; Factor 5 = Clarity.
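
The .40 inclusion criterion can be illustrated with a few rows of pattern-matrix loadings taken from Table 1. The sketch below is not the authors' procedure; in particular, resolving an item that meets the criterion on two factors by taking its highest loading is an assumption, since the bolding that marked the authors' choices is not preserved in this text.

```python
FACTORS = ("Dedication", "Attractiveness", "Enhancement", "Fairness", "Clarity")
CUTOFF = 0.40  # inclusion criterion stated in the text

# A subset of rows from Table 1 (loadings on Factors 1-5).
LOADINGS = {
    "Is helpful": (0.63, -0.15, -0.02, 0.23, -0.09),
    "Is physically attractive": (-0.17, 0.96, 0.02, 0.01, 0.04),
    "Shows videos": (-0.19, 0.02, 0.94, 0.13, 0.02),
    "Is consistent": (0.13, 0.06, 0.12, 0.56, 0.28),
    "Breaks down complicated topics": (0.40, -0.11, 0.14, 0.10, 0.44),
}


def factors_meeting_cutoff(loadings, cutoff=CUTOFF):
    """All factors on which an item loads at or above the cutoff."""
    return [name for name, value in zip(FACTORS, loadings) if value >= cutoff]


def primary_factor(loadings):
    """Factor with the highest loading (assumed tie-breaker for cross-loading items)."""
    best = max(range(len(loadings)), key=lambda i: loadings[i])
    return FACTORS[best]


cross_loading = factors_meeting_cutoff(LOADINGS["Breaks down complicated topics"])
```

"Breaks down complicated topics" meets the criterion on both Dedication (.40) and Clarity (.44), which shows why a rule for cross-loading items is needed when building composites.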


Based on these factor loadings, five composite variables were created with the mean 
scores for included items. The mean was used in place of a sum or other calculation so as to 
maintain the item scaling of 1 (never or almost never) to 4 (always or nearly always). Descriptive 
statistics, internal consistency within factors, and correlations among the computed variables are 
listed in Table 2. As seen in Table 2, internal consistency, as assessed with Cronbach’s α, was
high (>.70) for all but the clarity variable. This composite variable included the fewest individual 
items, and scales with fewer items often exhibit lower consistency as calculated with Cronbach’s 
formula (Peterson, 1994). 
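
A minimal sketch of the two computations described in this paragraph follows: a composite formed as the mean of its items, and Cronbach's alpha from the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The ratings are hypothetical and the function names are illustrative; this is not the authors' analysis script.

```python
from statistics import mean, variance


def composite_score(item_ratings):
    """Mean of one respondent's item ratings, which keeps the original 1-4 scaling."""
    return mean(item_ratings)


def cronbach_alpha(items):
    """Cronbach's alpha.

    `items` is a list of columns, one per item, each holding the ratings of the
    same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(ratings) for ratings in zip(*items)]  # per-respondent sum
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings from five respondents on a three-item scale.
hypothetical_items = [
    [3, 2, 4, 3, 1],
    [3, 3, 4, 2, 1],
    [2, 2, 4, 3, 2],
]
alpha = cronbach_alpha(hypothetical_items)
```

Because k appears in the formula, a scale with only a few items needs stronger inter-item correlations to reach the same alpha, which fits the explanation given above for the lower value on the clarity composite.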

Table 2. Descriptive Statistics and Correlations Among Variables Assessing Expectations of a Bogus
Professor

Item                M (SD)         Cronbach’s α    1        2        3        4
1. Dedication       2.83 (.60)     .91             —
2. Attractiveness   2.50 (1.01)    .93             .38**    —
3. Enhancement      2.44 (.71)     .88             .63**    .37**    —
4. Fairness         3.13 (.65)     .79             .61**    .05      .32**    —
5. Clarity          2.43 (.65)     .65             .62**    .06      .52**    .49**

**p < .01.


Evaluation of current professor. Participants were also asked to evaluate their current 
psychology professor. The psychology professor was selected as the target because all 
participants were enrolled in a psychology class; however, their other coursework would vary. 
Participants rated the professor using the same items used to assess interpretations of online 
ratings of a bogus professor (see original stem items listed in Table 1). Specifically, students 
indicated how often their current professor displayed the characteristic in question using a scale 
from 1 (never or almost never) to 4 (always or almost always). For consistency, the same 
composite variables were computed. The investigators did conduct PCA with these items as well 
to ensure that similar factors emerged. Considerable overlap with the original PCA conducted 
with the responses for the bogus professor was evident in these results. Descriptive statistics, 
internal consistency, and bivariate correlations for these variables addressing evaluation of the 
current psychology professor are presented in Table 3. 

Table 3. Descriptive Statistics and Correlations Among Variables Assessing Evaluation of an Actual
Professor

Item                M (SD)         Cronbach’s α    1        2        3        4
1. Dedication       3.61 (.46)     .90             —
2. Attractiveness   2.65 (.79)     .85             .40**    —
3. Enhancement      3.05 (.66)     .81             .56**    .54**    —
4. Fairness         3.54 (.55)     .76             .72**    .34**    .48**    —
5. Clarity          3.58 (.63)     .83             .71**    .42**    .52**    .59**

**p < .01.

Results 


Interpretation of Online Professor Ratings 

To examine hypothesized differences in participants’ expectations of a bogus professor 
based on the professor’s gender and designated hotness, a multivariate analysis of covariance 
(MANCOVA) was conducted with the five composite dependent variables (dedication, 
attractiveness, enhancement, fairness, and clarity) and two factor or grouping variables: professor 
gender (man versus woman) and hotness (chili pepper versus no chili pepper). Participant sex 
was included as a covariate. MANCOVA was conducted in place of a series of analysis of 
covariance (ANCOVA) due to significant correlation between most of the five dependent variables 
(r = .05 to r = .63; p < .01 for 8 out of 10 correlations; see Table 2). The assumption of homogeneity
of covariances was examined with Box’s test of equality of covariance matrices, and results were 
not significant (p = .26), suggesting the assumption had not been violated. However, the 
assumption of normality did appear to be violated for all dependent variables based on significant 
Kolmogorov-Smirnov and Shapiro-Wilk tests and evident negative skew (with scores situated at 
higher values) in histograms. Because ANOVA procedures are considered robust to violations of 
the normality assumption (Field, 2013; Mertler & Vannatta, 2005), analyses proceeded, but results 
should be interpreted with caution. 

The main effect of professor gender was not statistically significant (Wilks’ Λ = .99; F =
.45; p = .81); however, the main effect of hotness was statistically significant (Wilks’ Λ = .53; F =
38.16; p = .00; partial η² = .47, or large effect size). The interaction effect for Gender × Hotness was
explored, but it was not significant.
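
The reported multivariate effect sizes follow from Wilks' lambda (Λ) through the relation partial η² = 1 − Λ^(1/s), where s is the smaller of the number of dependent variables and the effect's degrees of freedom. For a two-level factor, s = 1, so partial η² = 1 − Λ. The sketch below applies that relation; the verbal labels use the commonly cited benchmarks of .01, .06, and .14, which is an assumption because the source does not state which thresholds were applied.

```python
def partial_eta_squared_from_wilks(wilks_lambda, s=1):
    """Multivariate partial eta squared: 1 - lambda ** (1 / s)."""
    if not 0 < wilks_lambda <= 1:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    return 1 - wilks_lambda ** (1 / s)


def effect_size_label(eta_squared):
    """Verbal label using assumed benchmarks (.01 small, .06 medium, .14 large)."""
    if eta_squared >= 0.14:
        return "large"
    if eta_squared >= 0.06:
        return "medium"
    if eta_squared >= 0.01:
        return "small"
    return "negligible"


# Hotness main effect reported in the text: lambda = .53.
hotness_eta = round(partial_eta_squared_from_wilks(0.53), 2)
hotness_label = effect_size_label(hotness_eta)
```

With Λ = .53 the function returns .47, matching the value reported for the hotness effect.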

Follow-up tests were conducted using a series of univariate ANCOVAs. For the hotness
effect, differences at the univariate level were significant for attractiveness (F = 148.17; p = .00;
partial η² = .40, or large effect size), which was not surprising. Less anticipated, there were significantly
different expectations of professor clarity based on hotness (F = 9.80; p = .00; partial η² = .04, or small
effect size). Examination of the means (see Figure 1) revealed that participants expected Alex
Johnson to be more attractive and lower in clarity when a chili pepper was included in the online
professor rating summary presented.

Figure 1. Mean expectations of bogus professor designated as either hot (chili pepper) or not hot (no chili
pepper). 

Rating of Current Professor 

Examination of the effects of professor gender (man versus woman), perceived age (under 
35 versus 35 or older), and student-rated attractiveness (at or below the median versus above 
the median) on ratings of four of the current psychology professor’s apparent qualities (dedication, 
enhancement, fairness, and clarity) also was conducted with MANCOVA, with both participant 
sex and current age in years included as covariates. For this analysis, attractiveness was 
excluded as a dependent variable because it was used as a grouping variable. The assumption 
of homogeneity of covariances was again examined with Box’s test of equality of covariance 
matrices, but results were significant (p = .00) for this analysis, suggesting the assumption had 
been violated. Box’s test is known to be highly sensitive, and unequal cell sizes can increase the 
likelihood of obtaining significant results. For this analysis, ensuring equal cell sizes was difficult 
because existing professor qualities were being evaluated. Specifically, there were 104 
participants reporting having men as professors compared to 113 reporting women professors, 
and 94 participants believing their professor to be under 35 compared to 123 perceiving their 
professor as 35 or older. In addition, the assumption of normality once again appeared to be 
violated for all dependent variables based on significant Kolmogorov-Smirnov and Shapiro-Wilk 
tests and evident negative skew (with scores situated at higher values) in histograms. Because 
ANOVA procedures are considered robust to these violations (Field, 2013; Mertler & Vannatta, 
2005), analyses proceeded as planned, but results should still be interpreted cautiously. 
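
The attractiveness grouping described above (at or below the median versus above the median) can be sketched as follows. The scores are hypothetical, chosen only so that their median matches the 2.60 cutoff reported for this sample, and the function name is an assumption for illustration. The example also shows one reason cell sizes can end up unequal: scores tied at the median all fall into the lower group.

```python
from collections import Counter
from statistics import median


def median_split(scores):
    """Label each score 'above' or 'at_or_below' relative to the sample median."""
    cutoff = median(scores)
    labels = ["above" if score > cutoff else "at_or_below" for score in scores]
    return cutoff, labels


# Hypothetical attractiveness ratings (not study data); two scores tie at the median.
hypothetical_attractiveness = [1.8, 2.2, 2.6, 2.6, 3.0, 3.4, 4.1]
cutoff, labels = median_split(hypothetical_attractiveness)
counts = Counter(labels)
print(cutoff, dict(counts))
```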

Participants’ ratings of their current psychology professor significantly differed by 
professor gender (Wilks’ Λ = .76; F = 13.17; p = .00; partial η² = .24, or large effect size) and 
perceived age (Wilks’ Λ = .82; F = 9.39; p = .00; partial η² = .19, or large effect size), but the 
interaction between the two variables also was statistically significant (Wilks’ Λ = .92; F = 3.47; 
p = .01; partial η² = .08, or medium effect size); therefore, the main effects will not be 
interpreted further. The follow-up ANCOVA indicated that the significant difference was on ratings 
of professor clarity (F = 7.59; p = .01; partial η² = .04, or small effect size). As illustrated in 
Figure 2, younger professors were generally rated higher in clarity, and men in the 35 or older 
category were rated as less clear than women in that age group. There was also a significant main 
effect for professor attractiveness (Wilks’ Λ = .87; F = 7.87; p = .00; partial η² = .13, or large 
effect size). As illustrated in Figure 3 and supported by 
follow-up ANCOVA, professors considered more attractive (above the median of 2.60) were rated 
significantly higher on dedication, enhancement, fairness, and clarity. 



Figure 2. Interaction between current professor sex and apparent age in affecting students’ evaluations of 
clarity. 

Figure 3. Mean ratings of current professor grouped by professor attractiveness. 

Discussion 

The current investigation addressed several hypotheses regarding possible sources of 
bias in college students’ interpretation and creation of online professor ratings such as those 
available on RMP. Findings revealed that online professor ratings may be swayed by a variety of 
personal features of professors, many of them largely outside the professors’ control. The 
investigators obtained support for bias stemming from professor gender, age, and hotness, 
although the specific findings did not always match those of previous research. 

Gender and Age of Professor 

Results of the current investigation supported student bias in favor of women professors 
when rating an actual professor but not when interpreting or forming expectations from an online 
professor rating. Professor gender interacted with professor age when evaluating a current 
psychology professor on clarity. Specifically, male professors believed to be 35 or older received 
the lowest clarity ratings compared to younger men and all women evaluated. Given that the 
sample was largely female and young, this finding may stem from dissimilarity between 
participants and older men as professors; younger female participants may perceive older men 
as too dissimilar from themselves. This is consistent with previous research indicating that individuals are 
more dismissive of information that they receive from others they perceive to be different from 
themselves (Wheeless, 1974). 

Additional explanations have been proposed for both gender- and age-related bias against 
professors. For example, age dissimilarity may produce a communication gap with 
undergraduates; this gap is especially wide for older faculty members who may have less 
familiarity with widely used technology and well-known examples from popular culture (Gehrt et 
al., 2015). Many digital technology-native university students (similar to the current sample) 
currently expect professors to incorporate PowerPoint slideshows, social media, and occasional 
YouTube clips into lectures and to post supplemental material online (Borboa, Joseph, & Spake, 
2012; Griesemer, 2011; Saeed, Yun, & Sinnappan, 2009); however, older faculty members may 
have developed their preferred teaching style and format when such features were not available 
or commonplace. Studies have indicated that older faculty members are more reluctant to 
implement computer-based teaching methods (Rousseau & Rogers, 1998), and faculty members 
in general report little to no formal training in implementing enhancements such as web-based 
instruction (Vodanovich & Piotrowski, 2005). While it is completely possible to deliver a quality 
lecture using only chalk or a dry-erase marker and a mounted wall board, students nonetheless 
may prefer notes and supplements accessible through electronic devices and examples linking 
course material to the video clips, memes, and streaming programs they commonly view on the 
same devices outside of class. 

In addition to dissimilarity, the bias against older men as professors may exist because 
there are more women teaching on college campuses than in the past. Previous evidence for 
preference for professors who are men (see Basow & Silberg, 1987) may reflect an earlier time 
when women professors were more of a novelty. Current university students in the United States 
are likely exposed to a mixture of men and women as professors, potentially eroding the once 
dominant stereotype of the university professor as male authority figure. Furthermore, the higher 
number of females in the sample may have further swayed findings because women have been 
shown to particularly value female faculty over male faculty (Bachen, McLoughlin, & Garcia, 1999; 
Basow, 2000). The body of evidence suggests a complex array of influencing factors at work and 
beyond professors’ command, with physical attractiveness further expanding the list of 
advantageous characteristics professors may be fortunate to possess out of sheer luck or effort 
unrelated to teaching ability. 

Professor Hotness 

Support for bias based on professor hotness emerged as well, although not all results 
fit expectations. It was hardly shocking that participants anticipated a professor designated 
as hot to be higher in attractiveness. Such a result implies that participants noticed the chili pepper 
and believed it to be accurate. More surprising was the finding that hot professors were expected 
to be lower in clarity, suggesting a negative effect of hotness. That is, hot professors were 
expected to look good but not teach as well in at least one domain. This result is in direct 
opposition to that of Bonds-Raacke and Raacke (2007), who reported higher ratings of clarity in 
professors rated higher in attractiveness. 

The present finding joins a growing body of evidence suggesting that the “what is beautiful is 
good” stereotype may be more complex, or subject to more exceptions, than previously believed. Other 
evidence for this includes Chia, Allred, Grossnickle, and Lee’s (1998) finding that, based on 
photographs of both attractive and unattractive men and women, unattractive men were rated 
highest in ability, while unattractive women were rated lowest in ability. In addition, Mehng (2015) 
observed a negative effect of attractiveness on perceived competence in the presence of low 
warmth. This finding may reflect what some have labeled the “beauty is beastly” effect (see 
Johnson, Podratz, Dipboye, & Gibbons, 2010). That is, there is a small body of evidence that 
attractiveness can be detrimental (for women, specifically) in some situations. In particular, 
physically attractive women are rated more negatively when applying for jobs perceived as more 
masculine in nature and for which physical appearance is deemed unimportant. The investigators 
did not find support for an interaction effect between gender and hotness, though, casting doubt 
on the “beauty is beastly” effect operating as it is typically described. Replication and further 
expansion of this research are clearly needed to better understand why college students might 
expect less clarity from professors having a chili pepper. 

The results of professor hotness on ratings of current professors are more consistent with 
prior work. Professors rated as more attractive were rated higher in all other areas. In addition to 
the possibility that students indeed perceived more attractive professors as more competent, it is 
possible that well-liked professors were rated favorably in all areas assessed regardless of their 
actual level of hotness. This is consistent with previous research indicating that RMP 
attractiveness ratings are influenced by students’ positive illusions of professors (Theyson, 2015). 
Additionally, because the grouping for this analysis resulted from a quasi-experimental instead of 
true experimental design, the authors cannot say for certain whether attractiveness influenced 
the other professor ratings in a unidirectional manner. Furthermore, because attractiveness was 
not rated by objective outside observers, it is certainly possible that less attractive professors with 
other strong teaching qualities may have become more attractive to students over time. Still, given 
that many aspects of hotness are out of one’s control, and more so with age, results are 
concerning considering that professors not graced with hotness may be unduly penalized for a 
feature not intrinsically linked to actual teaching ability. Likewise, higher ratings assigned to 
professors lucky enough to possess pleasing physical features may place them at an advantage 
in competing for student enrollment, faculty positions, teaching awards, tenure, and promotion. 

Limitations and Future Directions 

This investigation possessed several limitations. First, the sample was limited to students 
in lower level psychology courses. Given that course level (Goldberg & Callahan, 1991; Moritsch 
& Suter, 1988; Patrick, 2011) and discipline (Cashin, 1992; Patrick, 2011) may bias student 
evaluations, results may not generalize to students taking upper level classes or those from other 
disciplines. The unequal representation of male and female students prevented comparisons 
based on participant sex, though sex was included as a covariate in all analyses. The current 
study also did not include professor race as a variable because the current psychology professors 
being evaluated were all White. Previous research suggests racial-minority faculty members are 
rated most harshly on RMP, and that Black men as professors are rated particularly negatively 
(Reid, 2010); therefore, race should be included in analyses when possible. 

Convenience sampling presents an additional limitation of the current study. Most 
participants identified as White. This may limit generalizability of results to larger, more ethnically 
diverse college groups. Future research could extend the current study to college campuses with 
greater ethnic and racial diversity in their student body. Moreover, the average participant age in 
the sample was 19 years old, seating this sample in the traditional college student age range 
(National Center for Educational Statistics, 2016). Findings from this largely traditional-aged 
sample may not generalize well to nontraditional students. This may be particularly applicable to 
findings related to technology use in the classroom; nontraditional college students may differ in 
the importance that they place on classroom technology use compared to digital-native, 
traditional-aged college students. 

Additionally, geography may limit the generalizability of the findings. Participants were 
recruited from a public university in the southern United States; however, RMP evaluations 
represent professors across the United States as well as Canada and the United Kingdom. Future 
investigations may benefit from recruitment of participants from all regions represented on RMP 
to determine if findings from an American sample generalize to student populations in other 
countries with RMP. Additionally, it is important to learn if these findings would also generalize to 
students who use similar services in countries where RMP is currently unavailable. These 
students include those who use the Rate My Teachers Republic of Ireland, Rate My Teachers 
Australia, and Rate My Teachers New Zealand websites. Despite differences in rating site used, 
these findings suggest that student ratings available through unofficial, non-university-affiliated 
sources (e.g., RMP, Rate My Teachers, social media) have the potential to impact students’ 
perceptions of their professors. This should be taken seriously given that these perceptions, 
formed before students ever meet professors, can impact students’ motivations for a course and 
course-related behavior (Edwards et al., 2007; Kowai-Bell et al., 2011). 

Because the measures used in hypothesis testing were developed specifically for this 
study, replication is needed, including attempts to more firmly establish the reliability 
and validity of the measures. Alternate means of measuring relevant variables should be explored, 
too. For example, when examining the effect of hotness on evaluations of current professors, it 
would be desirable to have trained outside coders objectively rate physical attractiveness. Another 
key feature to assess would be professors’ actual teaching style, vocal quality, organization, and 
evident personality features. Again, this information would best be assessed by objective coders 
unfamiliar with the professors. 

Implications 

Taken together, and keeping these limitations in mind, these findings indicate a 
multifaceted and complicated association between teaching competence and student 
evaluations. Moreover, they suggest cause for concern regarding the widespread and increasing 
use of online professor ratings by students and university administration. These seemingly fun 
ratings have the potential to influence course enrollment, sway expectations of students who do 
enroll in a given course, and tarnish a professor’s reputation. Alarmingly, these negative 
consequences may not result from poor teaching so much as largely uncontrollable factors such 
as a professor not being the preferred gender, not looking or acting young enough, or not being 
as easy on the eyes as peers or competitors. Sites like RMP may rate professors, but findings 
like these compel us to ask whether they truly rate teaching or have any appropriate place in 
students’ or university administrators’ decision making. 

Despite these concerns about the validity of RMP ratings, the fact remains that they can and 
sometimes do influence decision making. For example, students perceive these evaluations as 
credible tools to inform their education-related decisions (Field, Bergiel, & Viosca, 2008; Davison 
& Price, 2009; Hayes & Prus, 2014; Landry et al., 2010). The influence of RMP-style ratings on 
students’ decision making suggests that faculty members should not be hasty and completely 
reject their content, despite the ratings’ biases. Further supporting the case that RMP content 
merits some faculty attention is evidence that RMP narrative content is pertinent to teaching (Otto 
et al., 2008) and focused on instruction-related characteristics such as content knowledge, clarity 
in communication, and organization (Hartman & Hunt, 2013; Kindred & Mohammed, 2005; Silva 
et al., 2008). 

Although RMP ratings are not intended to provide formative feedback to faculty, some 
may be nonetheless interested in exploring their ratings. We suggest that faculty members 
seeking to garner formative information from RMP ratings approach them with similar techniques 
utilized for formal, university-administered student evaluations. For example, Buskist and Hogan 
(2010) recommend a systematic approach to interpretation of student ratings. First, remove all 
comments that are irrelevant to course content or teaching (e.g., “Her clothes are ugly”). Next, 
remove all comments that lack concrete, specific information about teaching or the course (e.g., 
“She is super” or “I think it’s stupid that we have to take these classes to graduate”). Then, group 
comments into two categories: (a) aspects of teaching and course content that can be changed 
(e.g., “Assignments would work better in a different sequence”) and (b) aspects of teaching and 
course content that cannot be changed (e.g., “This subject has a lot of technical information”). 
Aspects of teaching and content that can be changed may also be further categorized as (i) things 
useful to change and (ii) things not useful to change. For example, comments such as “I hate that 
we have to do outside reading” reflect pedagogically useful components of a course despite their 
unpopularity with some students. 
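
The sorting procedure described above can be sketched as a simple decision sequence. In this illustration the judgments themselves (whether a comment is relevant, specific, changeable, or useful to change) are made by a human reader and supplied as flags; the class, field, and function names are hypothetical and do not come from the cited work. The example comments are the ones quoted in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    IRRELEVANT = "remove: irrelevant to course content or teaching"
    NONSPECIFIC = "remove: lacks concrete, specific information"
    CANNOT_CHANGE = "keep: aspect that cannot be changed"
    CHANGE_USEFUL = "keep: can be changed, useful to change"
    CHANGE_NOT_USEFUL = "keep: can be changed, not useful to change"


@dataclass
class Comment:
    text: str
    relevant: bool = True          # about teaching or course content?
    specific: bool = True          # concrete, specific information?
    changeable: bool = True        # could the instructor change it?
    useful_to_change: bool = True  # would changing it serve learning?


def triage(comment):
    """Apply the screening steps in the order given in the text."""
    if not comment.relevant:
        return Bucket.IRRELEVANT
    if not comment.specific:
        return Bucket.NONSPECIFIC
    if not comment.changeable:
        return Bucket.CANNOT_CHANGE
    if comment.useful_to_change:
        return Bucket.CHANGE_USEFUL
    return Bucket.CHANGE_NOT_USEFUL


comments = [
    Comment("Her clothes are ugly", relevant=False),
    Comment("She is super", specific=False),
    Comment("Assignments would work better in a different sequence"),
    Comment("This subject has a lot of technical information", changeable=False),
    Comment("I hate that we have to do outside reading", useful_to_change=False),
]
for c in comments:
    print(f"{triage(c).value}: {c.text}")
```

Treating the sequencing comment as useful to change is an assumption made for the example; in practice that judgment rests with the instructor.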

In addition to their influence upon student decision making, RMP ratings may also 
influence administrator decision making regarding hiring, tenure, and promotion. We caution 
against their use in this way, however. Given that course easiness influences RMP’s overall quality ratings 
(Felton et al., 2008), it may be tempting to “water down” or reduce the rigor of one’s course to 
improve one’s overall quality ratings. This concern regarding reduced rigor extends to the results 
of formal, university-administered student evaluations of teaching as well (Zabaleta, 2007). When 
the results of any form of student evaluation, formal or informal, are used in summative hiring, 
tenure, and promotion decisions, it is important to include other assessments of teaching 
effectiveness, for example, peer and supervisor teaching evaluations as well as teaching 
portfolios (Marsh, 1984; Marsh & Roche, 1997; Zabaleta, 2007). 